Regression Discontinuity

  • “A localized experiment at the cutoff point”

  • Hard to distinguish effects of specific policies from variables that cause those policies to be implemented.

  • But arbitrary cutoffs are extremely common in policy: “need” may be a complex latent characteristic that is nearly impossible to measure, but “is your household income less than 130% of the federal poverty line?” is just a check box.

Health insurance rates by age

Birth date and age at time of entering first grade

McEwan, P. J., & Shapiro, J. S. (2008). The benefits of delayed primary school enrollment: Discontinuity estimates using exact birth dates. Journal of Human Resources, 43(1), 1-29.

Discontinuity assumptions

  • Characteristics like age aren’t randomly distributed, but they have a random component.
  • As you get closer to a hard cutoff, “randomness” accounts for a proportionally larger share of the difference in outcomes: birth year is fairly non-random, but being born at 11:59 PM on February 4th is not really different from being born at 12:01 AM on February 5th.
  • Depending on how much noise there is, randomness could account for 100% of the difference within a window near the discontinuity.
  • The causal statement here is: “but for [the cutoff] there would be a continuous line through different values of X”

RDD essentials

  • We need a running variable and a discontinuity/cutoff. The running variable must be ordinal or continuous.

  • The cutoff should impact our outcome only through its impact on the independent variable of interest

    • This can be problematic for things like age and income, because you qualify for multiple things at age 65 or at x% of the federal poverty line
  • Units near the cutoff should be as similar as possible

    • They won’t be identical by definition, but you can control for the “forcing” characteristic.
  • The amount of random variation shouldn’t suddenly increase near the cutoff
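These essentials can be illustrated with a small simulation (all names and values here are invented): if we control for the running variable, the coefficient on the cutoff indicator recovers the jump at the discontinuity.

```r
set.seed(42)

# Simulated sharp design: running variable x, cutoff at 0, true jump of 5
x <- runif(500, -10, 10)
treated <- as.integer(x >= 0)
y <- 2 * x + 5 * treated + rnorm(500, sd = 3)

# Linear model: control for the running variable, indicator for the cutoff
summary(lm(y ~ x + treated))
# The coefficient on `treated` should land near the true jump of 5
```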

Example: Carpenter and Dobkin

  • Carpenter and Dobkin (2011): does setting the drinking age at 21 accomplish anything?

Observational research suggests compliance is quite low.

But this is a federal law, so we don’t have a reasonable comparison group.

Minimum drinking age

However, the age cutoff itself could be a source of a discontinuity.

The simplest way to estimate the effect is a linear model with a control for the running variable and an indicator for the treatment. (We can also center age at 21 to simplify the interpretation.)

DV = all-cause mortality

              Model 1
Constant       91.841
              [90.220, 93.463]
Age            −0.975
              [−2.249, 0.299]
Over 21?        7.663
              [4.762, 10.564]
Num. obs.          48
R²              0.595

Age centered so that 0 = age 21.
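In R, this baseline specification is a plain lm() call. A sketch, assuming a data frame `df` with hypothetical columns `age` and `mortality`:

```r
library(dplyr)

# Hypothetical data: one row per age bin, columns `age` and `mortality`
df <- df |>
  mutate(age_c  = age - 21,              # center so 0 = the cutoff
         over21 = as.integer(age >= 21)) # treatment indicator

m1 <- lm(mortality ~ age_c + over21, data = df)
summary(m1)
```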

That said, this probably isn’t the best method because the slopes are different before and after the cutoff.

Using an interaction term allows the slope of the line to be different before and after the cutoff.

DV = all-cause mortality

              Model 1             Model 2
Constant       91.841              93.618
              [90.220, 93.463]   [91.739, 95.498]
Age            −0.975               0.827
              [−2.249, 0.299]    [−0.823, 2.477]
Over 21?        7.663               7.663
              [4.762, 10.564]    [5.005, 10.320]
Age × over 21                      −3.603
                                 [−5.937, −1.269]
Num. obs.          48                  48
R²              0.595               0.668

Age centered so that 0 = age 21.
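Model 2 just adds the interaction. In lm() syntax, `*` expands to both main effects plus their product (again assuming the hypothetical `df` sketched earlier):

```r
# age_c * over21 expands to age_c + over21 + age_c:over21,
# letting the slope differ on each side of the cutoff
m2 <- lm(mortality ~ age_c * over21, data = df)
summary(m2)
```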

  • Assuming we’ve got the right model here, hitting the minimum legal drinking age results in about 8 more deaths per 100,000 people. But the negative slope after 21 might indicate the effect wears off and then *reverses*.

However, the disaggregated results suggest that this isn’t the case: it is more likely that there are two separate trends.

Non-linearity

One important consideration here is accounting for non-linearity. Properly controlling for the effect of the running variable requires us to get the functional form right.

Assuming a linear effect when the effect is cubic will produce a spurious result. Note that there appears to be a discontinuity here when using a linear model.

Non-linearity

  • Adding polynomials (squared, cubed, etc. versions of the running variable) is one option, but

    • can result in over-fitting

    • still can’t approximate certain kinds of non-linearity

Non-linearity: local smoothing

Methods such as LOESS operate by estimating weighted polynomial regressions on a sliding “window” (usually called the bandwidth) of data points and then smoothing that result using a kernel function*.

All else equal, a smaller window gives greater weight to individual points, while a larger window will result in a smoother line.

*As you might guess, this is related to the methods that generate kernel density plots.
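In R this is the stock loess() function. A sketch using the hypothetical `df` from earlier, fitting each side of the cutoff separately so the smoother cannot average across the discontinuity:

```r
# `span` is the bandwidth: the share of points in each local window
fit_lo <- loess(mortality ~ age_c, data = subset(df, age_c < 0),  span = 0.75)
fit_hi <- loess(mortality ~ age_c, data = subset(df, age_c >= 0), span = 0.75)

# Predicted values just on either side of the cutoff;
# their difference is the estimated jump
predict(fit_lo, newdata = data.frame(age_c = -0.01))
predict(fit_hi, newdata = data.frame(age_c = 0.01))
```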

Non-linearity: local smoothing

  • Local smoothing is typically not a great tool for hypothesis testing because it doesn’t really give you an “effect” estimate. But in the discontinuity case, we only care about the difference around the cutoff.
  • The choice of a bandwidth and kernel function adds some complexity, but there are ways to estimate an optimal window.
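The rdrobust package’s rdbwselect() implements several data-driven bandwidth selectors. A sketch, assuming `y` is the outcome and `x` the running variable centered at the cutoff:

```r
library(rdrobust)

# MSE-optimal bandwidth, the same selector rdrobust uses for 'mserd'
bw <- rdbwselect(y, x, c = 0, bwselect = 'mserd')
summary(bw)
```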

Example: Incumbency effects

Is the incumbency advantage real? How could we estimate it? Are there reasons to doubt it exists?

Vote share and winning a two-party election

Obviously, there’s a built-in discontinuity here!

Incumbency effects

The relationship is a bit easier to spot if we focus on observations near the cutoff point, but it is still messy, and there is good reason to suspect the effect is non-linear.

Vote share and winning a two-party election

Using rdrobust’s rdplot() to choose an optimal number of bins:

library(rdrobust)

out <- rdplot(data$demshare_t2, data$dem_margin,
              p = 3, kernel = 'uniform',
              x.label = 'Democratic margin at time 1',
              y.label = 'Dem voteshare at time 2',
              title = '', hide = TRUE)
out$rdplot

library(dplyr)  # for slice_head()

out <- rdrobust(data$demshare_t2, data$dem_margin,
                kernel = 'uniform',
                p = 3, bwselect = 'mserd')

data.frame(coef = out$coef, se = out$se, z = out$z, ci = out$ci) |>
  slice_head(n = 1)

Coeff   Std. Err.   z      CI Lower   CI Upper
9.55    2.21        4.31   5.21       13.9

Testing assumptions near the cutoff

A simple placebo test of the RDD assumptions is to move the cutoff to a value where nothing should change and check whether the “effect” is still significant.
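With rdrobust this is just the `c` argument. A sketch using the election data from above; the 5-point fake cutoff here is illustrative, not the value used for the results reported below:

```r
# Placebo: same model, but the "cutoff" is moved to a 5-point margin,
# where nothing should actually change
placebo <- rdrobust(data$demshare_t2, data$dem_margin,
                    c = 5, kernel = 'uniform',
                    p = 3, bwselect = 'mserd')
summary(placebo)
```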

Coeff   Std. Err.   z       CI Lower   CI Upper
2.12    2.29        0.927   −2.36      6.6

Vote share as a discontinuity

Vote share is the most common discontinuity design in political science.

What are we estimating here?

  • Technically, we’re estimating a local average treatment effect near the cutoff. This only matches the average treatment effect if the effects are similar across all values of the running variable. So consider:

    • People near an income threshold undoubtedly benefit more from income assistance compared to people far away

    • Swing districts may be more economically volatile than more partisan ones.

    • People around age 21 may do riskier stuff when they drink compared to older people.

Whether this matters is partly a question of interpretation.

Fuzzy and sharp boundaries

  • Plurality elections impose a sharp boundary, but other boundaries may be “soft”

    • Not everyone who is eligible for food stamps will enroll

    • Some elected officials resign or never take their seats

    • The thresholds themselves may be measured with error (blood alcohol content, registration, place of residence)

  • This could be considered similar to the problem of “non-compliance”: some units should be treated, but aren’t. The methods are similar too: estimate a two-stage regression model to adjust for non-compliance.
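rdrobust handles this directly through its `fuzzy` argument, which takes the treatment actually received and runs the two-stage adjustment. A sketch with hypothetical vectors `y`, `x`, and `took_treatment`:

```r
# Fuzzy design: crossing the cutoff shifts the *probability* of treatment.
# `fuzzy` supplies realized treatment status, and rdrobust estimates the
# two-stage (instrumental-variable style) effect at the cutoff.
out_fuzzy <- rdrobust(y, x, c = 0, fuzzy = took_treatment)
summary(out_fuzzy)
```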

Others

Hopkins (2004): county threshold for language minorities and ballot assistance

Posner: Regression discontinuity and cultural cleavage

Considerations

  • Are observations near the cutoff fundamentally unlike observations elsewhere?

  • Is it possible some cases are changing their behavior in response to being “near” the cutoff?

  • Do small changes to the method for fitting the running variable cause big changes in the estimated effect? If so, the results might be spurious.

  • Does a placebo test give null results?

    • Testing “fake” cutoffs shortly before or after the real one

    • Checking for discontinuities in other variables that shouldn’t be responsive to the treatment (e.g., homicides also spiking around 21 might indicate a different process is at work)